{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "In the real world, many graphs, such as social networks and citation networks, \n",
    "are too large to be loaded on a single machine.\n",
    "\n",
    "To handle such graphs, PGL provides a Distributed Graph Engine framework that \n",
    "supports graph sampling on large-scale graphs for distributed GNN training.\n",
    "\n",
    "In this tutorial, we walk through the steps of using the Distributed Graph Engine for graph sampling. \n",
    "\n",
    "We also provide a launch script for launching a distributed Graph Engine. For more examples of distributed GNN training, please refer to [the PGL examples](https://github.com/PaddlePaddle/PGL/tree/main/examples).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Requirements\n",
    "\n",
    "paddlepaddle>=2.1.0\n",
    "\n",
    "pgl>=2.1.4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Starting a Distributed Graph Engine Service\n",
    "\n",
    "Suppose we have a graph with two types of nodes (u and t), defined by the edge and node files created below.\n",
    "\n",
    "First, we create a configuration file and an IP address file that lists the address of each machine. \n",
    "Here we use two ports on the local host to simulate two machines.\n",
    "\n",
    "After creating the configuration file and the IP address file, we can start two graph servers.\n",
    "\n",
    "Then we can use the client to sample neighbors or nodes from the graph servers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/ssd2/liweibin02/projects/pgl_dygraph/pgl/distributed/__init__.py:26: UserWarning: The Distributed Graph Engine is experimental, we will officially release it soon\n",
      "  \"The Distributed Graph Engine is experimental, we will officially release it soon\"\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "from pgl.distributed import DistGraphClient, DistGraphServer\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "edges_file = \"\"\"37\t45\t0.34\n",
    "37\t145\t0.31\n",
    "37\t112\t0.21\n",
    "96\t48\t1.4\n",
    "96\t247\t0.31\n",
    "96\t111\t1.21\n",
    "59\t45\t0.34\n",
    "59\t145\t0.31\n",
    "59\t122\t0.21\n",
    "97\t48\t0.34\n",
    "98\t247\t0.31\n",
    "7\t222\t0.91\n",
    "7\t234\t0.09\n",
    "37\t333\t0.21\n",
    "47\t211\t0.21\n",
    "47\t113\t0.21\n",
    "47\t191\t0.21\n",
    "34\t131\t0.21\n",
    "34\t121\t0.21\n",
    "39\t131\t0.21\"\"\"\n",
    "\n",
    "node_file = \"\"\"u\t98\n",
    "u\t97\n",
    "u\t96\n",
    "u\t7\n",
    "u\t59\n",
    "t\t48\n",
    "u\t47\n",
    "t\t45\n",
    "u\t39\n",
    "u\t37\n",
    "u\t34\n",
    "t\t333\n",
    "t\t247\n",
    "t\t234\n",
    "t\t222\n",
    "t\t211\n",
    "t\t191\n",
    "t\t145\n",
    "t\t131\n",
    "t\t122\n",
    "t\t121\n",
    "t\t113\n",
    "t\t112\n",
    "t\t111\"\"\"\n",
    "\n",
    "\n",
    "tmp_path = \"./tmp_distgraph_test\"\n",
    "if not os.path.exists(tmp_path):\n",
    "    os.makedirs(tmp_path)\n",
    "\n",
    "with open(os.path.join(tmp_path, \"edges.txt\"), 'w') as f:\n",
    "    f.write(edges_file)\n",
    "\n",
    "with open(os.path.join(tmp_path, \"node_types.txt\"), 'w') as f:\n",
    "    f.write(node_file)"
   ]
  },
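  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both files written above are tab-separated text. Judging from the data, each line of `edges.txt` holds a source node, a destination node, and a third column that we assume to be an edge weight, while each line of `node_types.txt` holds a node type and a node ID. The following is a minimal sketch in plain Python, independent of PGL, for reading that format back; the helper names `parse_edges` and `parse_node_types` are hypothetical and only serve this illustration.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "\n",
    "def parse_edges(text):\n",
    "    # Map each source node to a list of (destination, weight) pairs.\n",
    "    # Assumption: the third column is an edge weight.\n",
    "    adj = defaultdict(list)\n",
    "    for line in text.strip().splitlines():\n",
    "        src, dst, weight = line.split('\\t')\n",
    "        adj[int(src)].append((int(dst), float(weight)))\n",
    "    return dict(adj)\n",
    "\n",
    "\n",
    "def parse_node_types(text):\n",
    "    # Map each node type to the list of node IDs of that type.\n",
    "    types = defaultdict(list)\n",
    "    for line in text.strip().splitlines():\n",
    "        ntype, nid = line.split('\\t')\n",
    "        types[ntype].append(int(nid))\n",
    "    return dict(types)\n",
    "\n",
    "\n",
    "# A short excerpt of the edge list and node list used in this tutorial.\n",
    "sample_edges = '98\\t247\\t0.31\\n7\\t222\\t0.91\\n7\\t234\\t0.09'\n",
    "sample_nodes = 'u\\t98\\nu\\t7\\nt\\t247\\nt\\t222\\nt\\t234'\n",
    "\n",
    "adj = parse_edges(sample_edges)\n",
    "types = parse_node_types(sample_nodes)\n",
    "print(adj[7])      # successors of node 7 with their weights\n",
    "print(types['u'])  # node IDs of type u\n",
    "```\n",
    "\n",
    "Running the same helpers on the full `edges_file` and `node_file` strings gives a local reference against which the results returned by the graph client can be compared."
   ]
  },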
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# configuration file\n",
    "config = \"\"\"\n",
    "etype2files: \"u2e2t:./tmp_distgraph_test/edges.txt\"\n",
    "symmetry: True\n",
    "\n",
    "ntype2files: \"u:./tmp_distgraph_test/node_types.txt,t:./tmp_distgraph_test/node_types.txt\"\n",
    "\n",
    "\"\"\"\n",
    "\n",
    "ip_addr = \"\"\"127.0.0.1:8342\n",
    "127.0.0.1:8343\"\"\"\n",
    "\n",
    "\n",
    "with open(os.path.join(tmp_path, \"config.yaml\"), 'w') as f:\n",
    "    f.write(config)\n",
    "    \n",
    "with open(os.path.join(tmp_path, \"ip_addr.txt\"), 'w') as f:\n",
    "    f.write(ip_addr)"
   ]
  },
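  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `etype2files` and `ntype2files` values in the configuration above appear to follow a `type:path` pattern, with multiple entries separated by commas. As a rough sketch under that assumption (this is not PGL's own parser, and `parse_type2files` is a hypothetical name), such a value can be split into a dictionary as follows:\n",
    "\n",
    "```python\n",
    "def parse_type2files(value):\n",
    "    # Split 'type1:path1,type2:path2' into {'type1': 'path1', 'type2': 'path2'}.\n",
    "    mapping = {}\n",
    "    for item in value.split(','):\n",
    "        # Split on the first colon only, so that a path may itself contain colons.\n",
    "        type_name, path = item.strip().split(':', 1)\n",
    "        mapping[type_name] = path\n",
    "    return mapping\n",
    "\n",
    "\n",
    "ntype2files = 'u:./tmp_distgraph_test/node_types.txt,t:./tmp_distgraph_test/node_types.txt'\n",
    "etype2files = 'u2e2t:./tmp_distgraph_test/edges.txt'\n",
    "\n",
    "print(parse_type2files(ntype2files))\n",
    "print(parse_type2files(etype2files))\n",
    "```\n",
    "\n",
    "In this tutorial both node types point to the same `node_types.txt` file, and the single edge type `u2e2t` points to `edges.txt`."
   ]
  },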
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/ssd2/liweibin02/projects/pgl_dygraph/pgl/distributed/helper.py:60: UserWarning: nfeat_info attribute is not existed, return None\n",
      "  warnings.warn(\"%s attribute is not existed, return None\" % attr)\n"
     ]
    }
   ],
   "source": [
    "\n",
    "config = os.path.join(tmp_path, \"config.yaml\")\n",
    "\n",
    "ip_addr = os.path.join(tmp_path, \"ip_addr.txt\")\n",
    "shard_num = 10\n",
    "gserver1 = DistGraphServer(config, shard_num, ip_addr, server_id=0)\n",
    "gserver2 = DistGraphServer(config, shard_num, ip_addr, server_id=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/ssd2/liweibin02/projects/pgl_dygraph/pgl/distributed/helper.py:60: UserWarning: node_batch_stream_shuffle_size attribute is not existed, return None\n",
      "  warnings.warn(\"%s attribute is not existed, return None\" % attr)\n",
      "/ssd2/liweibin02/projects/pgl_dygraph/pgl/distributed/dist_graph.py:172: UserWarning: node_batch_stream_shuffle_size is not specified, default value is 20000\n",
      "  warnings.warn(\"node_batch_stream_shuffle_size is not specified, \"\n",
      "[INFO] 2021-06-18 18:56:30,655 [dist_graph.py:  200]:\tload edges of type u2e2t from ./tmp_distgraph_test/edges.txt\n",
      "[INFO] 2021-06-18 18:56:30,658 [dist_graph.py:  210]:\tload nodes of type u from ./tmp_distgraph_test/node_types.txt\n",
      "[INFO] 2021-06-18 18:56:30,658 [dist_graph.py:  210]:\tload nodes of type t from ./tmp_distgraph_test/node_types.txt\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "data loading finished\n"
     ]
    }
   ],
   "source": [
    "client1 = DistGraphClient(config, shard_num=shard_num, ip_config=ip_addr, client_id=0)\n",
    "\n",
    "client1.load_edges()\n",
    "client1.load_node_types()\n",
    "print(\"data loading finished\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[34, 59, 98]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# randomly sample nodes of a given node type\n",
    "client1.random_sample_nodes(node_type=\"u\", size=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[333, 111, 121]\n",
      "[234, 113, 191]\n",
      "[131, 122, 222]\n",
      "[211, 112]\n",
      "[45, 145, 247]\n",
      "[48]\n"
     ]
    }
   ],
   "source": [
    "# iterate over all nodes of type \"t\" from each server, in batches of 3\n",
    "node_generator = client1.node_batch_iter(batch_size=3, node_type=\"t\", shuffle=True)\n",
    "for nodes in node_generator:\n",
    "    print(nodes)"
   ]
  },
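  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output above contains a short batch (`[211, 112]`) in the middle of the stream, which is what one would expect if batches are formed separately for each server. The sketch below illustrates that behavior in plain Python. It is an assumption based on the printed output, not PGL's implementation, and both `node_batch_iter_reference` and the node split across servers are hypothetical.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "\n",
    "def node_batch_iter_reference(nodes_per_server, batch_size, shuffle=True, seed=0):\n",
    "    # Yield batches server by server; the last batch of a server may be short.\n",
    "    rng = random.Random(seed)\n",
    "    for nodes in nodes_per_server:\n",
    "        nodes = list(nodes)\n",
    "        if shuffle:\n",
    "            rng.shuffle(nodes)\n",
    "        for start in range(0, len(nodes), batch_size):\n",
    "            yield nodes[start:start + batch_size]\n",
    "\n",
    "\n",
    "# Hypothetical split of five node IDs across two servers.\n",
    "batches = list(node_batch_iter_reference([[111, 112, 113], [121, 122]], batch_size=2))\n",
    "print([len(b) for b in batches])  # [2, 1, 2]\n",
    "```\n",
    "\n",
    "Training code that consumes such a stream should therefore not assume that every batch has exactly `batch_size` nodes."
   ]
  },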
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[247], [222, 234]]\n"
     ]
    }
   ],
   "source": [
    "# sample neighbors\n",
    "# note that the edge type \"u2e2t\" is defined in the config.yaml file\n",
    "nodes = [98, 7]\n",
    "neighs = client1.sample_successor(nodes, max_degree=10, edge_type=\"u2e2t\")\n",
    "print(neighs)"
   ]
  }
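  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check, the result above can be reproduced with a small pure-Python reference. This is only a sketch, not PGL's implementation: `sample_successor_reference` is a hypothetical helper, the `successors` dictionary is copied by hand from the edge list, and we assume that `max_degree` caps the number of returned neighbors by uniform random sampling.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "# Directed u -> t edges of the two query nodes, copied from edges.txt.\n",
    "successors = {98: [247], 7: [222, 234]}\n",
    "\n",
    "\n",
    "def sample_successor_reference(nodes, max_degree, rng=random):\n",
    "    # Return at most max_degree successors for each node in nodes.\n",
    "    result = []\n",
    "    for node in nodes:\n",
    "        neighs = successors.get(node, [])\n",
    "        if len(neighs) > max_degree:\n",
    "            # Assumption: excess neighbors are dropped by uniform sampling.\n",
    "            neighs = rng.sample(neighs, max_degree)\n",
    "        result.append(list(neighs))\n",
    "    return result\n",
    "\n",
    "\n",
    "print(sample_successor_reference([98, 7], max_degree=10))  # [[247], [222, 234]]\n",
    "```\n",
    "\n",
    "Because both nodes have fewer than 10 successors, no sampling takes place here and the full neighbor lists are returned."
   ]
  }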
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "paddle16",
   "language": "python",
   "name": "paddle16"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
